Genome-wide meta-analyses of smoking behaviors in African Americans
JJ Hu, SC Hunt, SA Ingles, EM John, R Kittles, S Kolb, LN Kolonel, L Le Marchand, Y Liu, KK Lohman,
B McKnight, RC Millikan, A Murphy, C Neslund-Dudas, S Nyante, M Press, BM Psaty, DC Rao, S Redline,
JL Rodriguez-Gil, BA Rybicki, LB Signorello, AB Singleton, J Smoller, B Snively, B Spring, JL Stanford, SS Strom,
GE Swan, KD Taylor, MJ Thun, AF Wilson, JS Witte, Y Yamamura, LR Yanek, K Yu, W Zheng, RG Ziegler,
AB Zonderman, E Jorgenson, CA Haiman and H Furberg

The identification and exploration of genetic loci that influence smoking behaviors have been conducted primarily in populations
of European ancestry. Here we report results of the first genome-wide association study meta-analysis of smoking behavior
in African Americans in the Study of Tobacco in Minority Populations Genetics Consortium (n = 32 389). We identified one
non-coding single-nucleotide polymorphism (SNP; rs2036527[A]) on chromosome 15q25.1 associated with smoking quantity
(cigarettes per day), which exceeded genome-wide significance (β = 0.040, s.e. = 0.007, P = 1.84 × 10⁻⁸). This variant is present
in the 5′-distal enhancer region of the CHRNA5 gene and defines the primary index signal reported in studies of European
ancestry. No other SNP reached genome-wide significance for smoking initiation (SI, ever vs never smoking), age of SI, or
smoking cessation (SC, former vs current smoking). Informative associations that approached genome-wide significance
included three modestly correlated variants at 15q25.1 within PSMA4, CHRNA5 and CHRNA3 for smoking quantity, which are
associated with a second signal previously reported in studies in European ancestry populations, and a signal represented by
three SNPs in the SPOCK2 gene on chr10q22.1. The association at 15q25.1 confirms this region as an important susceptibility
locus for smoking quantity in men and women of African ancestry. Larger studies will be needed to validate the suggestive loci
that did not reach genome-wide significance and to further elucidate the contribution of genetic variation to disparities in
cigarette consumption, SC and smoking-attributable disease between African Americans and European Americans.
Translational Psychiatry (2012) 2, e119; doi:10.1038/tp.2012.41; published online 22 May 2012



Introduction 

Smoking is influenced by genetic and environmental factors.
Genome-wide association studies (GWAS) in populations of
European ancestry have identified genetic variation
associated with smoking behaviors, including smoking initiation
(SI), smoking quantity and smoking cessation (SC). An
initial, large (n = 10 995) GWAS of smoking quantity identified
associations with genetic variants in the nicotinic acetylcholine
receptor α5, α3 and β4 subunit cluster on chromosome
15q25.1. Genome-wide meta-analyses in three large consortia
(n = 74 053, 31 226 and 41 150) of smoking behaviors
confirmed the finding at 15q25.1 and refined the association
signal within the locus. Additional studies in diverse
populations also have revealed independent signals in this
region, suggesting multiple biologically functional variants.
This locus has also been reported as a susceptibility locus for
lung cancer; however, whether this effect is independent of
smoking behavior is unclear. Additional regions have been
identified for smoking quantity (CHRNB3/CHRNA6 on 8p11,
CYP2A6 on 19q13 and LOC100188947 on 10q25), SI
(BDNF on 11p13) and SC (DBH on 9q34).

To date, all published GWAS for smoking behaviors have
been conducted in populations of European descent.
Conducting GWAS in non-European populations, such as
African ancestry populations, is important because of their
greater genetic diversity and population differences in disease
allele frequency, linkage disequilibrium patterns and phenotype
prevalence. For smoking behaviors, the need for
GWAS in African American populations is particularly clear:
African Americans, on average, initiate smoking later and smoke
fewer cigarettes per day, yet are less likely to successfully
quit smoking. Further, they have a higher risk of
smoking-related lung cancer than many other populations. Ethnic
differences in the clearance of nicotine, cotinine and other
metabolites have been shown to contribute to the observed
differences in cigarette consumption across populations,
mediated in part by genetic variants in the cytochrome P450
2A6 gene.

The genetic architecture of smoking-related traits is not well
described in non-European ancestral groups, but there is
evidence that genetic determinants have important implications
for multiple addictive behaviors in populations globally.
We established the Study of Tobacco in Minority Populations
(STOMP) Genetics Consortium, which represents 13 GWAS
studies of men and women of African ancestry, to search for
risk loci for smoking behaviors in this population.



Materials and methods 

Study description. The STOMP Genetics Consortium
comprises the following studies: the Women's Health
Initiative SNP Health Association Resource (n = 8208), the
African American GWAS consortia of Breast Cancer
(n = 5061) and Prostate Cancer (n = 5556), the Candidate
Gene Association Resource Consortium (including the
Atherosclerosis Risk in Communities (n = 2916) study, the
Cleveland Family Study (n = 632), the Coronary Artery Risk
Development in Young Adults (n = 953) study, the Jackson
Heart Study (n = 2145) and the Multi-Ethnic Study of
Atherosclerosis (n = 1646)), the Cardiovascular Health
Study (n = 801), the Healthy Aging in Neighborhoods
across the Life Span Study (n = 918), the Health ABC
Study (n = 1137), the Genetic Study of Atherosclerosis Risk
(n = 1175) and the Hypertension Genetic Epidemiology
Network (n = 1241). A description of each participating
study, as well as details regarding the measurement and
collection of smoking data for each study, is provided in the
Supplementary Materials. All studies had local Institutional
Review Board approval for the present study and all
participants provided written informed consent.

Smoking phenotypes. We examined four smoking
phenotypes previously shown to be heritable in African
and European ancestry samples and used in prior
GWAS of smoking behavior. SI contrasted individuals who
reported having smoked at least 100 cigarettes during their lifetime
(ever smokers) with those who reported having smoked
between 0 and 99 cigarettes during their lifetime (never
smokers), consistent with the Centers for Disease Control
classification. Among smokers, the age of SI (AOI)
represented the age at which individuals began smoking. Some
studies captured the age at which participants first tried smoking, whereas
others collected the age at which they began smoking regularly. As
prior research suggests similar heritabilities and a high genetic
correlation between these phenotypes, we considered it justified to use
either value in a general assessment of AOI. Similarly, for
cigarettes smoked per day (CPD), some studies collected
maximum CPD, whereas others collected average CPD.
Longitudinal twin data suggest a high correlation between
these variables over time, which supported using either value
in our analyses. For studies that collected CPD as ranges,
the mid-point of the interval was used as the data point; for
example, individuals who reported the CPD category 0-4
were assigned a CPD value of 2. SC contrasted individuals
who had quit smoking at interview (former smokers) with
those who were current smokers. As relapse to smoking is
highest within the first year after quitting, we tried to reduce
misclassification by excluding smokers who quit within 1 year
of interview in studies with available data. Table 1
presents distributions of smoking phenotypes across
participating studies.
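
The phenotype definitions above can be illustrated with a minimal Python sketch. The function names and the input fields (`lifetime_cigarettes`, `years_since_quit`) are hypothetical and are not taken from any participating study's data dictionary; reading "within 1 year" as strictly less than 1 year is also an assumption.

```python
from typing import Optional


def classify_initiation(lifetime_cigarettes: int) -> str:
    """SI phenotype: 'ever' for at least 100 lifetime cigarettes, else 'never'."""
    return "ever" if lifetime_cigarettes >= 100 else "never"


def cpd_from_category(low: float, high: float) -> float:
    """CPD reported as a range is replaced by the mid-point of the interval."""
    return (low + high) / 2.0


def classify_cessation(current_smoker: bool,
                       years_since_quit: Optional[float] = None) -> Optional[str]:
    """SC phenotype: 'former' or 'current'; None means excluded from the SC analysis.

    Recent quitters are excluded only when the quit date is available,
    mirroring "within studies with available data".
    """
    if current_smoker:
        return "current"
    # Assumption: "within 1 year" is read as strictly less than 1 year.
    if years_since_quit is not None and years_since_quit < 1.0:
        return None
    return "former"
```

For instance, `cpd_from_category(0, 4)` gives the value of 2 quoted in the text for the 0-4 category.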

Genotyping and quality control. Each study performed its
own genotyping using Illumina (San Diego, CA, USA) or
Affymetrix (Santa Clara, CA, USA) GWAS arrays.
Supplementary Tables 1 and 2 present the details of the
arrays, genotyping quality control procedures and sample
exclusions (i.e., sex mismatch, call rate failure, relatedness,
missing smoking data and ancestry outliers) for each study. The
quality control filters applied by each study were comparable;
single-nucleotide polymorphisms (SNPs) with call rates
<95% (except the Genetic Study of Atherosclerosis Risk,
<90%), <1% minor allele frequency or significant departure
from Hardy-Weinberg equilibrium (P < 10^[exponent illegible]) were excluded,
as were individuals with excess autosomal heterozygosity,
mismatch between reported and genetically determined sex,
or first- or second-degree relatedness. Genome-wide
imputation was carried out in each study using the
software MACH, IMPUTE, BEAGLE or BIMBAM v0.99
to infer genotypes for SNPs that were not genotyped directly
on the platforms, but were genotyped on the HapMap phase
2 CEU and YRI samples. SNPs with imputation quality
scores <0.5 were excluded.
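
As a rough illustration of the SNP-level filters just described, the following Python sketch applies the stated call-rate, allele-frequency and imputation-quality cut-offs. The class and function names are hypothetical. The Hardy-Weinberg P-value threshold is deliberately left as a required argument because its value is not legible in the text; applying the genotyping filters only to directly genotyped SNPs is likewise an assumption.

```python
from dataclasses import dataclass


@dataclass
class GenotypedSnp:
    call_rate: float  # fraction of samples with a genotype call
    maf: float        # minor allele frequency
    hwe_p: float      # P-value of the Hardy-Weinberg equilibrium test


def keep_genotyped(snp: GenotypedSnp,
                   hwe_p_threshold: float,
                   min_call_rate: float = 0.95,
                   min_maf: float = 0.01) -> bool:
    """True if a directly genotyped SNP survives the three stated filters."""
    return (snp.call_rate >= min_call_rate
            and snp.maf >= min_maf
            and snp.hwe_p >= hwe_p_threshold)


def keep_imputed(quality: float, min_quality: float = 0.5) -> bool:
    """True if an imputed SNP has an imputation quality score of at least 0.5."""
    return quality >= min_quality
```

A study with a different call-rate cut-off, such as the 90% used by the Genetic Study of Atherosclerosis Risk, would pass `min_call_rate=0.90`.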

Data analyses. Study-specific GWAS analysis. Each study
conducted uniform cross-sectional analyses for each
smoking phenotype using an additive genetic model.
Logistic regression was used for discrete traits (SI and SC)
and linear regression was used for quantitative traits (CPD
and AOI). Continuous, quantitative traits were normalized by
transformation to Z scores, owing to heavy tails and
non-normality. Outliers with |Z| > 2 were removed within each
study, and the resulting Z scores were used as the outcome in
ordinary least-squares regression. To investigate potential sources of
heterogeneity across studies, we examined the distribution
of African ancestry in each cohort (Supplementary Figure 1).
To account for population stratification and admixture, all
studies adjusted for an appropriate number of eigenvectors
from a study-specific principal components analysis. In
addition, study-specific analyses included adjustment for age
and case status or study site, when appropriate. Genomic
control inflation factors were computed using standard
methods.
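
The quantitative-trait analysis described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the consortium's pipeline: the function names are hypothetical, the genotype is coded as an allele dosage between 0 and 2, and the covariates mentioned in the text (eigenvectors, age, case status, study site) are omitted so that the additive model reduces to a simple regression.

```python
from math import sqrt
from statistics import mean, stdev


def z_scores(values):
    """Standardize a quantitative trait to mean 0 and s.d. 1."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def drop_outliers(z, dosages, max_abs_z=2.0):
    """Remove observations whose |Z| exceeds the cut-off, keeping pairs aligned."""
    kept = [(zi, gi) for zi, gi in zip(z, dosages) if abs(zi) <= max_abs_z]
    return [zi for zi, _ in kept], [gi for _, gi in kept]


def additive_ols(dosages, y):
    """Slope and standard error from regressing y on coded-allele dosage."""
    n = len(y)
    if n < 3:
        raise ValueError("need at least three observations")
    mx, my = mean(dosages), mean(y)
    sxx = sum((x - mx) ** 2 for x in dosages)
    if sxx == 0:
        raise ValueError("SNP is monomorphic in this sample")
    beta = sum((x - mx) * (yi - my) for x, yi in zip(dosages, y)) / sxx
    intercept = my - beta * mx
    rss = sum((yi - intercept - beta * x) ** 2 for x, yi in zip(dosages, y))
    se = sqrt(rss / (n - 2) / sxx)
    return beta, se
```

Because the outcome is a Z score, the slope is expressed in standard-deviation units of the trait per copy of the coded allele.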

Meta-analyses of GWAS results. We performed fixed-effect
meta-analysis for each smoking phenotype by computing
pooled inverse-variance-weighted β-coefficients, s.e. and Z
scores for each SNP. All GWAS results were corrected via
genomic control (GC) before the meta-analysis. The study-specific
lambda (λ) values utilized in this step ranged from 1.01 to 1.08 for
SI (Supplementary Table 1). Heterogeneity across studies
was investigated using the I² statistic. The results presented
herein are corrected by a second GC correction based on the λ of
the meta-analyses (λ < 1.02). A significance threshold of
P < 5 × 10⁻⁸ was considered to indicate genome-wide
significance. Linkage disequilibrium statistics for the largest of
the STOMP cohorts (Women's Health Initiative, n = 8208)
were calculated using DPRIME (http://www.phs.wfubmc.edu/
public/bios/gene/downloads.cfm). Linkage disequilibrium
statistics for CEU and YRI were obtained from HapMap phase 2.
Statistical power analysis was performed using QUANTO.
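
The meta-analytic steps described above (genomic control, inverse-variance weighting and the I² heterogeneity statistic) can be sketched in a few lines of Python. This is an illustrative sketch rather than the software used by the consortium; the function names are hypothetical, genomic control is implemented in the common way of inflating each standard error by the square root of λ, and leaving results untouched when λ is below 1 is an assumption.

```python
from math import erfc, sqrt
from statistics import NormalDist, median

# Median of a 1-d.f. chi-square distribution (about 0.455).
EXPECTED_MEDIAN_CHI2 = NormalDist().inv_cdf(0.75) ** 2


def genomic_inflation(chi2_stats):
    """Lambda: observed median test statistic over its expected value."""
    return median(chi2_stats) / EXPECTED_MEDIAN_CHI2


def gc_corrected_se(se, lam):
    """Inflate a standard error by sqrt(lambda); no change when lambda <= 1."""
    return se * sqrt(max(lam, 1.0))


def fixed_effect_meta(betas, ses):
    """Inverse-variance-weighted pooled beta, s.e., Z, two-sided P and I² (%)."""
    weights = [1.0 / se ** 2 for se in ses]
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    se = sqrt(1.0 / total)
    z = beta / se
    p = erfc(abs(z) / sqrt(2.0))  # two-sided normal tail probability
    q = sum(w * (b - beta) ** 2 for w, b in zip(weights, betas))  # Cochran's Q
    df = len(betas) - 1
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return beta, se, z, p, i2
```

In a two-stage scheme like the one described, `gc_corrected_se` would be applied once to each study's results with the study-specific λ and once more to the pooled results with the λ of the meta-analysis.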

Results 

The meta-analysis included 32 389 genotyped men and
women of African ancestry from 13 studies with sample sizes
ranging from n = 632 to n = 8208 (Table 1). Our meta-analysis
sample was 66.1% female, the mean age when smoking
information was collected ranged from 35.5 to 73.4 years, and
52.7% were ever smokers. Among smokers, mean CPD
ranged from 11.5 to 15.7, the mean AOI ranged from 17.3 to
23.3 years, and 44.8% were former smokers.

Table 1 Descriptive characteristics of the 13 studies participating in the STOMP Consortium

Study         N (% female)   Age, mean     Ever smokers   CPD, mean     AOI, mean     Former smokers
                             (s.d.)^a      (%)            (s.d.)^b      (s.d.)^b      (%)^b
AABC          5061 (100)     56.6 (12.6)   47.2           11.9 (8.4)    23.3 (9.0)    58.8
AAPC          5556 (0)       63.7 (9.6)    68.7           14.6 (9.9)    23.2 (9.0)    64.9
CHS           801 (63.2)     72.9 (5.6)    51.2           13.9 (11.2)   19.0 (5.2)    66.8
CARe
  ARIC        2916 (61.2)    54.1 (5.7)    52.2           14.4 (9.8)    19.5 (6.4)    28.1
  CARDIA      953 (61.4)     24.4 (3.8)    39.2           11.8 (8.7)    17.3 (5.1)    4.6
  CFS         632 (59.0)     35.5 (19.8)   45.1           13.1 (10.3)   19.0 (5.5)    13.3
  JHS         2145 (60.7)    55.2 (12.8)   33.2           14.9 (10.8)   19.3 (5.7)    17.0
  MESA        1646 (54.7)    62.2 (10.1)   53.5           14.6 (18.2)   18.3 (5.4)    35.0
GeneSTAR      1175 (61.7)    47.4 (12.3)   57.2           11.5 (10.3)   18.3 (5.4)    44.0
HANDLS        918 (54.5)     48.6 (9.0)    65.4           15.7 (32.8)   17.4 (6.2)    29.0
Health ABC    1137 (57.2)    73.4 (2.9)    56.4           15.7 (12.6)   19.5 (7.0)    69.5
HyperGEN      1241 (67.3)    45.2 (13.3)   48.7           12.1 (9.8)    19.5 (5.5)    58.0
WHI (SHARe)   8208 (100)     61.6 (7.0)    50.6           11.5 (9.5)    20.5 (5.9)    39.1

Abbreviations: STOMP, Study of Tobacco in Minority Populations; CPD, cigarettes smoked per day; AOI, age of smoking initiation; AABC, African American GWAS
consortia of Breast Cancer; AAPC, African American GWAS consortia of Prostate Cancer; CHS, Cardiovascular Health Study; CARe, Candidate Gene Association
Resource; ARIC, Atherosclerosis Risk in Communities; CARDIA, Coronary Artery Risk Development in Young Adults; CFS, Cleveland Family Study; JHS, Jackson
Heart Study; MESA, Multi-Ethnic Study of Atherosclerosis; GeneSTAR, Genetic Study of Atherosclerosis Risk; HANDLS, Healthy Aging in Neighborhoods across the
Life Span Study; HyperGEN, Hypertension Genetic Epidemiology Network; WHI, Women's Health Initiative; SHARe, SNP Health Association Resource. Descriptive
statistics for smoking behaviors included ever smokers only.
^a Age in years. ^b Calculated among ever smokers.

Sample sizes for the four smoking phenotype analyses (i.e.,
with complete genotype and phenotype data) were n = 32 389
for SI, n = 16 877 for AOI, n = 15 547 for CPD and n = 16 215
for SC. Manhattan plots for the four smoking phenotypes after
double-GC scaling are shown in Figure 1. In the entire analysis,
only one SNP, rs2036527, achieved genome-wide significance
for one trait, CPD (β = 0.04, s.e. = 0.007, P = 1.84 × 10⁻⁸,
I² = 41.6%, Table 2; study-specific results are shown in
Supplementary Table 3). This variant is located 6246 bp 5′ of
the CHRNA5 gene on chromosome 15q25.1. We observed
multiple SNPs with P-values of the order of 10⁻⁷ associated with CPD:
rs3101457, located in intron 2 (IVS2) of C1orf100 on 1q44, and
rs547843, located 63 kb 5′ of a non-coding RNA sequence
(LOC503519) on 15q12. Three highly correlated SNPs
(r² > 0.95, YRI) in the SPOCK2 gene on 10q22.1 exhibited
P-values of the order of 10⁻⁷ with AOI (Table 2). The most significant
associations for SI and SC were observed at rs566973 (approximately 20 kb
3′ of CRCT1 on 1q21.3) and rs3813637 (in the 3′-untranslated
region of C1orf49 on 1q25.2), respectively (data not shown).

Figure 1 Double genomic control (GC)-corrected Manhattan plots showing significance of association of all single-nucleotide polymorphisms (SNPs) for four smoking
phenotypes (a-d). SNPs are plotted on the x axis according to their position on each chromosome against, on the y axis (shown as -log10 P-value), the association with (a)
smoking initiation (SI, ever vs never smokers), (b) age of SI, (c) cigarettes smoked per day and (d) smoking cessation (former vs current smokers). Dotted red line indicates
genome-wide significance threshold of P < 5 × 10⁻⁸.

Table 2 SNPs with meta-analytic P-values of < 1 × 10⁻⁶ for CPD and AOI

Phenotype  SNP        Chromosome      Nearby      Alleles*  Coded  Sample    β        s.e.    P-value        I² (%)
                      (bp position)   genes                 AF     size (N)
CPD        rs2036527  15 (76638670)   CHRNA5      A/G       0.22   15 554    0.040    0.007   1.84 × 10⁻⁸    41.6
CPD        rs667282   15 (76650527)   CHRNA5      C/T       0.29   15 536    0.033    0.006   1.81 × 10⁻⁷    21.7
CPD        rs3101457  1 (242599837)   C1orf100    A/G       0.75   15 513    0.041    0.008   2.63 × 10⁻⁷    1.1
CPD        rs938682   15 (76683602)   CHRNA3      A/G       0.71   15 475    0.033    0.006   3.75 × 10⁻⁷    17.4
CPD        rs547843   15 (23975140)   LOC503519   C/G       0.65   12 701    -0.035   0.007   6.16 × 10⁻⁷    24.2
CPD        rs3813570  15 (76619887)   PSMA4       C/T       0.26   15 543    0.033    0.007   9.85 × 10⁻⁷    0.0
AOI        rs1678618  10 (73476294)   SPOCK2      A/G       0.74   16 874    -0.060   0.012   8.25 × 10⁻⁷    0.0
AOI        rs1245577  10 (73480920)   SPOCK2      C/G       0.26   16 877    0.060    0.012   8.30 × 10⁻⁷    2.6
AOI        rs1612028  10 (73475296)   SPOCK2      C/G       0.75   16 798    -0.060   0.012   9.28 × 10⁻⁷    6.3

Abbreviations: AF, allele frequency; AOI, age of smoking initiation; CPD, cigarettes smoked per day; SNP, single-nucleotide polymorphism.
*First named allele is coded allele. Coded AF refers to the allele analyzed as the predictor allele; it is not necessarily the minor allele. All SNPs coded to NCBI Build 36/
UCSC hg18 forward strand. One SNP (rs2036527) achieved genome-wide significance.

Four top SNPs associated with CPD span approximately
100 kb (76.6-76.7 Mb) at 15q25.1, from rs3813570, located in
the 5′-untranslated region (c.-72T>C) of PSMA4, to
rs938682, located in IVS4 (c.378-1941C>T) of CHRNA3
(Table 2 and Figure 2). The most significant SNP, rs2036527,
is located between PSMA4 and CHRNA5, and is correlated
with the index signals (rs1051730, rs16969968) for CPD
reported in previous European ancestry studies. In CEU, the
r² is 0.84 between rs2036527 and rs1051730, and 0.93
between rs2036527 and rs16969968. The r² between
rs2036527 and rs1051730 is 0.44 in YRI and 0.502 in STOMP,
whereas rs16969968 is non-polymorphic. Rs2036527 is also
correlated with SNPs in European Americans that tag a
haplotype associated with increased expression of CHRNA5
in prefrontal cortex brain samples from European Americans
and African Americans, but is not correlated with this
haplotype in African ancestry samples (r² between rs2036527
and rs1979905 = 0.443 in CEU, 0.045 in YRI and 0.064 in
STOMP). The additional signals at 15q25.1 with near
genome-wide significance in our study are represented by
rs667282, rs938682 and rs3813570, which are weakly
correlated with rs2036527 (r² ≤ 0.2 in CEU, 0.12 in YRI and
0.084 in STOMP). These three SNPs are correlated with each
other (r² ≥ 0.60 in CEU and 0.32 in YRI) as well as with rs578776
and other SNPs at 15q25.1 that define a signal for smoking
intensity in European ancestry populations that is
independent of rs2036527. However, when conditioning on
rs2036527 in the four largest study populations in our sample
(the African American GWAS consortia of Prostate Cancer,
African American GWAS consortia of Breast Cancer,
Candidate Gene Association Resource and Women's Health
Initiative; n = 13 113), the association between these three
SNPs and CPD diminished (P-values of 10^[exponent illegible] after
conditioning on rs2036527; Supplementary Figure 2). Assuming the
GWAS arrays utilized in this study provide adequate coverage
of common alleles at 15q25.1, this suggests either that there are not
multiple independent signals for CPD in this region in
African Americans or that the frequencies of the functional alleles
and/or their effect sizes are much smaller than the signal
defined by rs2036527.
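
The pairwise correlation values quoted above are r² statistics of linkage disequilibrium. As a hedged illustration of how such a value is defined (not of how DPRIME or HapMap compute it), the sketch below derives r² from phased two-SNP haplotypes. The function name and the 0/1 allele coding are assumptions made for the example.

```python
def ld_r2(haplotypes):
    """r² between two biallelic SNPs from phased haplotypes.

    Each haplotype is a pair (a, b), where a and b are 1 if the haplotype
    carries the coded allele at the first and second SNP, and 0 otherwise.
    Returns None when either SNP is monomorphic, because r² is then undefined.
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n
    p_b = sum(b for _, b in haplotypes) / n
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b  # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return None
    return d * d / denom
```

The `None` branch corresponds to the situation noted in the text for a non-polymorphic SNP, where no correlation with rs2036527 can be reported.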

Supplementary Table 4 presents how the variants associated
with smoking behaviors in European ancestry
populations performed in STOMP (rs1051730 in CHRNA3;
rs16969968 in CHRNA5; rs1329650 and rs1028936 in
LOC100188947; rs3733829 in EGLN2, near CYP2A6;
rs6265, rs1013443, rs4923457, rs4923460, rs4074134,
rs1304100, rs6484320 and rs879048 in BDNF; and
rs3025343, near DBH). We observed modest, nominally
statistically significant associations for CPD with rs1051730
(P = 0.0079) and rs16969968 (P = 0.027), and for SC with
rs3025343 (P = 0.03).

Discussion 

Investigating whether there are genetic variants associated
with smoking behavior among African Americans is important,
given that smoking prevalence and smoking-attributable
mortality differ by race/ethnicity. Smoking prevalence and
smoking intensity are lower for African Americans than
European Americans, yet African Americans are less likely
to successfully quit smoking.

To our knowledge, this is the first meta-analysis of GWAS
data for smoking behaviors in African Americans. The single
genome-wide significant association we observed between
rs2036527 and CPD is the same signal that was reported
previously at 15q25.1 for nicotine dependence, smoking
intensity and lung cancer in European ancestry samples.
The strong association that we found for this
SNP supports studies suggesting that it is highly correlated
with the functional allele(s) in populations of African ancestry.
The fact that we did not observe a strong second association
signal in this region after conditioning on rs2036527 suggests
that rs2036527 and correlated SNPs in African ancestry
populations may define a single common haplotype at
chr15q25.1 with sufficient effect size to be detected
in our sample. After back transformation of the beta estimate,
mean CPD values for each rs2036527 genotype were
14.6 for AA, 13.5 for AG and 12.8 for GG, suggesting that
there is an increase of less than one cigarette smoked per day
for each copy of the A allele. This SNP accounted
for approximately 0.20% of the phenotypic variance of CPD
in our sample. This effect is similar to that reported for
rs1051730, which is correlated with rs2036527, where each
copy of the rs1051730 A allele corresponds to an increase of
approximately one CPD and accounts for 0.5% of the
phenotypic variance in smoking quantity in populations of
European ancestry.

A study of CHRNA5 knock-out mice showed that
re-expressing this gene in the medial habenula, which extends
projections to a brain region shown to mediate nicotine
withdrawal, abolished the inhibitory effects of nicotine while
maintaining the reinforcing effects of nicotine. In a functional
magnetic resonance study of smokers, genetic variation in
CHRNA5 appeared to also affect reactivity to smoking cues in
the insula, hippocampus and dorsal striatum, regions
implicated in addictive behavior and memory. Thus, it is
biologically plausible that rs2036527, as a correlate of
increased expression of the CHRNA5 gene, could be
associated with smoking quantity as a consequence of
neuro-adaptations resulting from complex interactions
between genes and environment that alter positive and negative
reinforcement.

To our knowledge, no SNPs in the SPOCK2 gene, which
encodes a protein that forms part of the extracellular matrix,
have been reported previously in association with smoking
behaviors or smoking-related cancer phenotypes. Variants at
the SPOCK2 locus have been linked to bronchopulmonary
dysplasia, a respiratory condition observed in premature
infants that has been linked to intrauterine smoke exposure.
These variants are weakly correlated with the SNPs
identified at this locus for AOI in Europeans (r² < 0.25 in CEU),
but are not correlated in African ancestry populations
(r² = 0). The top SNP associated with SC (rs3813637) is
located at 1q25 in the C1orf49 gene. This locus has been
linked to late-onset Alzheimer's disease, but genetic variation
at this locus has not been reported in association with smoking
behavior. We are not aware of any smoking-related, other
behavioral or pathological phenotypes associated with the
variants we detected at 1q44 (C1orf100) and 15q12
(LOC503519) or CRCT1 for CPD.
Figure 2 Forest and regional plot of rs2036527 with cigarettes smoked per day
(CPD) from meta-analyses of the Study of Tobacco in Minority Populations
(STOMP) consortia. Forest plot showing effect sizes across studies; I² = 41.6%.
Regional association plot shows single-nucleotide polymorphisms (SNPs) plotted by
position on chromosome against -log10 P-value. Estimated recombination rates
(from HapMap-CEU) are plotted in light blue to reflect the local linkage
disequilibrium (LD) structure on a secondary y axis. The SNPs surrounding the
most significant SNP (purple) are color-coded to reflect their LD with this SNP (using
pairwise r² values from HapMap-CEU): red, >0.8; orange, 0.6-0.8;
green, 0.4-0.6; light blue, 0.2-0.4; dark blue, <0.2. The blue bars at the bottom of
the plot represent the relative size and location of genes in the region. AABC,
African American GWAS consortia of Breast Cancer; AAPC, African American
GWAS consortia of Prostate Cancer; ARIC, Atherosclerosis Risk in Communities;
CARDIA, Coronary Artery Risk Development in Young Adults; CFS, Cleveland
Family Study; JHS, Jackson Heart Study; MESA, Multi-Ethnic Study of
Atherosclerosis; HANDLS, Healthy Aging in Neighborhoods across the Life Span
Study; HYPGEN, Hypertension Genetic Epidemiology Network; WHI, Women's
Health Initiative.



Although this is the largest GWAS meta-analysis of
smoking phenotypes conducted to date in men and women
of African ancestry, statistical power was a significant
limitation. We had 80% power (for a mean allele frequency
of 0.15 and α of 5 × 10⁻⁸) to detect effect sizes of 1.25 for SI,
AOI and SC, and a β of 0.15 for CPD. Notably, effect sizes for
variants associated with many of these smoking phenotypes
in the larger GWAS of European ancestry were
much smaller. For example, the TAG, ENGAGE and Ox-GSK
consortia reported β for SI of 0.015 for SNPs in BDNF and
0.026 for rs3025343 in DBH. Thus, we cannot rule out the
possibility of additional loci that influence smoking behavior
among African Americans that may be detected with larger
sample sizes.
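
To make the dependence of power on sample size, allele frequency and effect size concrete, the sketch below gives a textbook normal approximation for a quantitative trait. It is not the QUANTO calculation and is not expected to reproduce the figures quoted above. The function name is hypothetical, and the approximation assumes a standardized trait (variance 1), Hardy-Weinberg genotype frequencies and a SNP that explains only a small share of the variance.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(n, allele_freq, beta, alpha=5e-8):
    """Approximate power of a two-sided test of an additive SNP effect.

    n           -- sample size
    allele_freq -- frequency of the coded allele
    beta        -- per-allele effect in s.d. units of the trait
    alpha       -- significance threshold (genome-wide by default)
    """
    std_normal = NormalDist()
    # Expected test statistic: beta divided by its approximate standard error,
    # 1 / sqrt(n * Var(G)), with Var(G) = 2p(1 - p) under Hardy-Weinberg.
    expected_z = abs(beta) * sqrt(n * 2.0 * allele_freq * (1.0 - allele_freq))
    z_crit = -std_normal.inv_cdf(alpha / 2.0)
    return (std_normal.cdf(expected_z - z_crit)
            + std_normal.cdf(-expected_z - z_crit))
```

Comparing calls such as `approx_power(5000, 0.15, 0.05)` and `approx_power(50000, 0.15, 0.05)` shows how sharply power for a small per-allele effect depends on sample size at a genome-wide threshold.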

This analysis was limited by the fact that we were not able to
adjust for local admixture, and the chip coverage of common
variants (>5%) is less complete than in European
populations, a limitation that applies to most GWAS of African
American populations. However, the use of a global adjustment
for population genetic variation in the regression
analysis using the principal components approach provided
some measure of control for potential confounding because of
population admixture. Additionally, we acknowledge the
limited precision of the smoking phenotypes. Smoking
quantity is a highly heritable trait: estimates for CPD, heavy
versus light smoking and/or pack-years range from 40 to 70%
heritability in European, African and Asian ancestry twin
and family studies. Other studies have estimated that shared
environmental factors account for 50% or more of the
observed variation in SI, AOI and SC.

We were unable to directly assess more refined phenotypes
and highly heritable traits such as nicotine metabolism,
given our reliance on existing data originally collected for other
purposes. Moreover, we were unable to examine gene ×
environment interactions using a meta-GWAS analytic
approach. Our analyses did not incorporate environmental
covariates, such as type of cigarettes smoked
(mentholated or non-mentholated), dietary factors,
socioeconomic status and other factors that might influence one or
more of the phenotypes analyzed; these data were not uniformly
available and were beyond the scope of the planned analyses we
undertook in this discovery investigation. Future prospective
studies with more detailed characterizations of smoking
phenotypes and relevant environmental covariates are
needed to identify additional variants that may be associated
with smoking behaviors.

In summary, collective findings from GWAS among
African and European ancestry populations implicate the
chromosome 15q25 region as the most significant for smoking
quantity. However, for both populations, SNPs in this region
are associated with very small changes in smoking quantity
and explain a small proportion of the variance, which suggests
that conventional GWAS approaches may not be adequate to
discover the likely hundreds of variants that each contribute a small
increment to the additive genetic effects on heritable
traits, the so-called 'missing heritability' of complex diseases.
More refined, specific and harmonized phenotypes
capturing the complex behavior of SI, trajectories of progression
and cessation, and environmental effect-modifiers are
also needed to characterize the genetic architecture of smoking
behavior in different ancestral populations. Larger studies
utilizing next-generation SNP arrays, whole-exome or
whole-genome sequencing will be required to investigate
lower-frequency variation, which may contribute to unexplained
heritability for common traits.